<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="../style/journal.css" type="text/css" />
<style type="text/css"><!--
.googleadsense {
	margin: 2px;
	padding: 0px;
//--></style><script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-65008-1";
urchinTracker();
</script><title>获取 IRC logger 里的链接</title>
</head>
<body>
<a href="index.html">Journal</a>(2005) | <a href="../blog/"><b>Blog</b></a>(2006) | <a href="http://www.fayland.org/cgi-bin/random_link.pl">RandomLink</a> | <a href="AboutFayland.html">WhoAmI</a> | <a href="LiveBookmark.html">LiveBookmark</a> | <a href="http://www.fayland.org/">HomePage</a>
<p><&lt;Previous: <a href="JS_encode.html">Ajax && encodeURIComponent</a>&nbsp;&nbsp;>>Next: <a href="050519.html">Synopsis localization</a></p>
<h1>获取 IRC logger 里的链接</h1>
<div class='content'>
<p>Category: <a href='Script.html'>Script</a> &nbsp; Keywords: <b>irclogger link</b></p>PerlChina 的 IRC 位于 <a href='irc://irc.freenode.net/perlchina'>irc://irc.freenode.net/perlchina</a><br />
因为我不能长时间在线，又不想错过什么精彩链接，就写了这份代码来解析 <a href='http://koala.ilog.fr/twikiirc/bin/irclogger_logs/perlchina'>IRC log</a> 里的链接。<br />
代码位于 <a href='http://www.fayland.org/scripts/irc_link.pl.txt'>http://www.fayland.org/scripts/irc_link.pl.txt</a>

<h2>代码解释</h2>
首先 IRC log 的地址都是变动的，比如今天 irclogger 的地址为 http://koala.ilog.fr/twikiirc/bin/irclogger_log/perlchina?date=2005-05-18,Wed 明天后缀又会改。<br />
所以我们先用 <a href='http://search.cpan.org/perldoc?DataTime'>DataTime</a> 来拼凑这个 URL.
<pre><code>my $dt = DateTime->now;
my $url = 'http://koala.ilog.fr/twikiirc/bin/irclogger_log/perlchina?date=' . $dt->ymd . ',' . $dt->day_abbr; # like 2005-05-18,Wed</code></pre>
$dt->ymd 得到类如 2005-05-18 格式，而 $dt->day_abbr 得到星期的简写。<br />
<br />
第二是使用 <a href='http://search.cpan.org/perldoc?LWP'>LWP</a> 获取网页，因为这个 irclogger 需要验证，所以代码稍微复杂了点。<br />
<pre><code>use LWP;
use vars '@ISA'; # for get_basic_credentials
@ISA = 'LWP::UserAgent';
my $agent = __PACKAGE__->new;
my $request = HTTP::Request->new(GET => $url);

my $response = $agent->request($request);
$response->is_success or die $response->message;

sub get_basic_credentials {
    return ('perlchina', 'perlchina'); # the perlchina irc log site's username&password
}</code></pre>
这里我们重载了 <a href='http://search.cpan.org/perldoc?LWP::UserAgent'>LWP::UserAgent</a> 的 get_basic_credentials 函数。这样就能通过基本的 Web 验证了。<br />
<br />
最后获取连接，我没有使用现成的模块，而是用强大的 <a href='http://search.cpan.org/perldoc?HTML::Parser'>HTML::Parser</a> 来解析出链接。
<pre><code>use HTML::Parser;

my $parser = HTML::Parser->new(api_version => 3);
$parser->handler(start => \&got_links, 'tagname, attr');
$parser->parse($response->content);
$parser->eof;

sub got_links {
    my ($tagname, $attr) = @_;
    if ($tagname eq 'a' && $attr->{href} && $attr->{href} !~ /^[\/\?]/) {
        print '&lt;a href=\'', $attr->{href}, '\'>', $attr->{href}, '&lt;/a>&lt;br />';
    }
}</code></pre>
$attr->{href} !~ /^[\/\?]/) 是忽略掉以/和?开头的链接（这些是那页面的内部链接）。<br />
详细的代码解释就免了，perldoc 里都有简单的例子。<br />

<p>So, Enjoy</p></div>
<p><&lt;Previous: <a href="JS_encode.html">Ajax && encodeURIComponent</a>&nbsp;&nbsp;>>Next: <a href="050519.html">Synopsis localization</a></p>
<p><strong>Options:</strong> <a href='http://del.icio.us/post?title=%E8%8E%B7%E5%8F%96%20IRC%20logger%20%E9%87%8C%E7%9A%84%E9%93%BE%E6%8E%A5&url=http://www.fayland.org/journal/irclogger_link.html'>+Del.icio.us</a></p>
<strong>Related items</strong>
<ul><li><a href='RandomLink.html'>随机链接</a> < <span class='digit'>2005-06-04 01:00:57</span> ></li></ul>
Created on <span class="digit">2005-05-18 22:09:00</span>, Last modified on <span class="digit">2005-05-18 22:11:17</span><br />
Copyright 2004-2005 All Rights Reserved. Powered by <a href="Eplanet.html">Eplanet</a> && <a href='http://catalyst.perl.org'>Catalyst</a> 5.62.
</body>
</html>